Multimodal Information Bottleneck: Learning Minimal Sufficient Unimodal and Multimodal Representations
Authors
Abstract
Learning effective joint embedding for cross-modal data has always been a focus in the field of multimodal machine learning. We argue that during multimodal fusion, the generated multimodal embedding may be redundant, and the discriminative unimodal information may be ignored, which often interferes with accurate prediction and leads to a higher risk of overfitting. Moreover, unimodal representations also contain noisy information that negatively influences the learning of cross-modal dynamics. To this end, we introduce the multimodal information bottleneck (MIB), aiming to learn a powerful and sufficient multimodal representation that is free of redundancy and to filter out noisy information in unimodal representations. Specifically, inheriting from the general information bottleneck (IB), MIB aims to learn the minimal sufficient representation for a given task by maximizing the mutual information between the representation and the target and simultaneously constraining the mutual information between the representation and the input data. Different from general IB, our MIB regularizes both the multimodal and unimodal representations, which is a comprehensive and flexible framework that is compatible with any fusion methods. We develop three MIB variants, namely, early-fusion MIB, late-fusion MIB, and complete MIB, to focus on different perspectives of information constraints. Experimental results suggest that the proposed method reaches state-of-the-art performance on the tasks of multimodal sentiment analysis and multimodal emotion recognition across widely used datasets. The codes are available at https://github.com/TmacMai/Multimodal-Information-Bottleneck.
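For readers unfamiliar with the general IB principle the abstract builds on, the trade-off between keeping information about the target and discarding information about the input is commonly written as a Lagrangian. The sketch below uses the standard textbook form; the symbols x (input), y (target), z (representation), and the weight \beta are notation assumed here for illustration and do not appear in the abstract, and the paper's own MIB objectives may be formulated differently.

```latex
% General information bottleneck (IB) objective, textbook form.
% Assumed notation: x = input, y = target, z = learned representation,
% \beta > 0 = weight on the compression term.
\min_{p(z \mid x)} \; \mathcal{L}_{\mathrm{IB}}
  = -\, I(z; y) \;+\; \beta \, I(z; x)
```

The first term rewards representations that are predictive of the target (sufficiency), while the second penalizes information retained from the input (minimality). According to the abstract, MIB applies this kind of constraint to both the multimodal and the unimodal representations.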
Similar resources
Multimodal Versus Unimodal Instructions
This module provides an overview of multimodal perception, including information Your nose might even be stimulated by the smell of burning rubber or gasoline. In other words, how does the perceptual system determine which unimodal between the two balls that then bounce off each other in opposite directions. Principles and heuristics for designing minimalist instruction. H Van der Multimodal ve...
Learning Stimulus-Location Associations in 8- and 11-Month-Old Infants: Multimodal versus Unimodal Information.
Research on the influence of multimodal information on infants' learning is inconclusive. While one line of research finds that multimodal input has a negative effect on learning, another finds positive effects. The present study aims to shed some new light on this discussion by studying the influence of multimodal information and accompanying stimulus complexity on the learning process. We ass...
Image Pivoting for Learning Multilingual Multimodal Representations
In this paper we propose a model to learn multimodal multilingual representations for matching images and sentences in different languages, with the aim of advancing multilingual versions of image search and image understanding. Our model learns a common representation for images and their descriptions in two different languages (which need not be parallel) by considering the image as a pivot b...
Unimodal & Multimodal Biometric Recognition Techniques: A Survey
Biometric recognition refers to the automatic recognition of individuals based on feature vector(s) derived from their physiological and/or behavioral characteristics. Biometric recognition systems should provide reliable personal recognition schemes to either confirm or determine the identity of an individual. These features are used to provide an authentication for computer based security s...
Establishing and maintaining perceptual coherence: unimodal and multimodal evidence
How does a listener find and follow the speech of a talker? Many classic and contemporary accounts of speech perception start with a coherent sensory sample of speech already established, as if the perceptual world consisted solely of speech, and as if no more than a single talker ever spoke at once. Long-established and recent characterizations alike have cast the fundamental problem of speech...
Journal
Journal title: IEEE Transactions on Multimedia
Year: 2022
ISSN: 1520-9210, 1941-0077
DOI: https://doi.org/10.1109/tmm.2022.3171679